SSDB: Sequence Similarity Database in KEGG
Abstract
The availability of a large number of complete genomes enables us to compare several genomes and to search for common and differing features between them in terms of protein sequence similarities, an approach we call comparative genomics. It yields information about proteins that is useful for assigning functions to genes and for studying genome evolution. The large number of genes accumulated in the databases of complete genomes, however, has become a bottleneck, because computing the sequence similarity of all pairs of proteins is time consuming even on a supercomputer. Precomputed sequence similarities among completely sequenced organisms are therefore indispensable for comparative genomics. SSDB (Sequence Similarity Database) is a new addition to the KEGG suite of databases [3] and contains information about amino acid sequence similarities among all protein-coding genes in the complete genomes, together with information about best hits and bidirectional best hits (best-best hits). Gene x in genome A and gene y in genome B are called bidirectional best hits when x is the best hit of query y against all genes in A and y is the best hit of query x against all genes in B; this relation is often used as an operational definition of an ortholog. We report here the system design and simple search capabilities of SSDB.
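The bidirectional best hit relation defined in the abstract can be illustrated with a minimal Python sketch. It is not the SSDB implementation: the function names, the gene identifiers, and the score values are hypothetical, and the sketch assumes a single symmetric similarity score per gene pair, with ties resolved in favor of the first pair encountered.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (gene in genome A, gene in genome B) -> similarity score.
# Assumption: one symmetric score per pair; real search results may differ by direction.
Scores = Dict[Tuple[str, str], float]


def best_hit_map(scores: Scores, axis: int) -> Dict[str, str]:
    """Map each query gene to its top-scoring partner in the other genome.

    axis=0 treats genome A genes as queries; axis=1 treats genome B genes as queries.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for pair, score in scores.items():
        query, target = pair[axis], pair[1 - axis]
        # Strict comparison: on a tie, the first pair seen is kept.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Return sorted pairs (x, y) where y is x's best hit in B and x is y's best hit in A."""
    a_to_b = best_hit_map(scores, 0)
    b_to_a = best_hit_map(scores, 1)
    return sorted((x, y) for x, y in a_to_b.items() if b_to_a.get(y) == x)


# Made-up scores for illustration only.
example_scores: Scores = {
    ("a1", "b1"): 900.0,
    ("a1", "b2"): 300.0,
    ("a2", "b1"): 850.0,
    ("a2", "b2"): 200.0,
    ("a3", "b2"): 400.0,
}

print(bidirectional_best_hits(example_scores))
```

In this example a2 has b1 as its best hit, but b1's best hit is a1, so (a2, b1) is a one-way best hit and is excluded from the result.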
Similar resources
Classification of Protein Sequences into Paralog and Ortholog Clusters Using Sequence Similarity Profiles of KEGG/SSDB
We are constructing KEGG/OC (Ortholog Clusters) from KEGG/SSDB (Sequence Similarity DataBase) [2]. KEGG/SSDB contains exhaustive protein sequence similarity scores of completed and nearly completed genomes calculated by the SSEARCH program [3]. KEGG/OC is constructed automatically from the graph analysis of searching cliques with an appropriate definition for the profiles of similarity scores. ...
Automatic generation of KEGG OC (Ortholog Cluster) and its assignment to draft genomes
As the number of sequenced genomes is rapidly growing, a method for automatic generation of orthologous gene clusters is needed. However, it is computationally hard to cluster a large number of genes at once. To address this problem, we have developed a heuristic method to assign gene groups from closely related organisms to an ortholog cluster in a bottom-up approach. In this method, we consi...
The KEGG databases at GenomeNet
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and comple...
Identification of Ortholog Groups in KEGG/SSDB by Considering Domain Structures
A huge amount of genome information has been stored in databases with the advent of recent genome projects. Although we can effectively predict protein sequences from these genomes, the functions of most proteins have not been experimentally determined. Computational methods based on comparison and clustering of protein sequences are therefore most important for function prediction. However, complications a...
Additional File 2 – the Biological Support of the Gene Regulations for Yeast Cell Cycling. Knowledge Databases Kegg Database Sgd and Cygd Databases Results and Discussions
KEGG database The KEGG [1, 2] is a suite of databases and associated software that integrates current knowledge on molecular interaction networks in biological processes (PATHWAY database), the information about the universe of genes and proteins (GENES/SSDB/KO databases), and the information about the universe of chemical compounds, drugs and their biochemical reactions (COMPOUND/DRUG/GLYCAN/R...